Conversation
Init Test Results 📝
Pull request overview
Adds a new preprocessing path for SoccerTrackv2 / BePro event data and wires it into the existing soccer event preprocessing pipeline.
Changes:
- Added `UIED_bepro` event preprocessing/feature-engineering routine.
- Enabled `"bepro"` as a supported provider in `Soccer_event_data.preprocessing_single_df`.
- Adjusted SoccerTrack feature extraction to use `filtered_event_types` and tweaked local output filenames in `__main__` blocks; bumped package version.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pyproject.toml | Bumps package version to 0.1.44. |
| preprocessing/sports/event_data/soccer/soccer_processing.py | Introduces UIED_bepro and adds a __main__ example producing a preprocessed CSV. |
| preprocessing/sports/event_data/soccer/soccer_load_data.py | Changes SoccerTrack additional-feature extraction to read filtered_event_types; renames an output CSV in __main__. |
| preprocessing/sports/event_data/soccer/soccer_event_class.py | Adds "bepro" to preprocessing provider routing and updates the __main__ example to use it. |
      soccer_track_tracking_path="/data_pool_1/soccertrackv2/2024-03-18/Tracking/tracking.xml"
      soccer_track_meta_path="/data_pool_1/soccertrackv2/2024-03-18/Tracking/meta.xml"
    - df_soccertrack=Soccer_event_data('soccertrack',soccer_track_event_path,
    + df_soccertrack=Soccer_event_data('bepro',soccer_track_event_path,
In the __main__ example, the variable is named df_soccertrack but the provider passed is 'bepro'. Renaming the variable (or updating the provider name) would avoid confusion when running/debugging this script.
Suggested change:
    - df_soccertrack=Soccer_event_data('bepro',soccer_track_event_path,
    + df_soccertrack=Soccer_event_data('soccertrack',soccer_track_event_path,
    #round numerical columns to 4 decimal places (period, minute, second, X, Y)
    df = df.round({"Period": 4, "Minute": 4, "Second": 4, "seconds": 4, "start_x": 4, "start_y": 4, "deltaX": 4, "deltaY": 4, "distance": 4, "dist2goal": 4, "angle2goal": 4})

    df['team'] = df.team_id
    df['Period'] = df.period
    df['Minute'] = df.seconds // 60
    df['Second'] = df.seconds % 60
UIED_bepro calls df.round({..."Period", "Minute", "Second"...}) before those columns are created (they're assigned just below). pandas ignores dictionary keys that are not columns of the DataFrame, so the call does not fail, but the rounding is silently skipped for these three columns. Create Period/Minute/Second first (or round the existing period/seconds columns), then round.
Suggested change:
    - #round numerical columns to 4 decimal places (period, minute, second, X, Y)
    - df = df.round({"Period": 4, "Minute": 4, "Second": 4, "seconds": 4, "start_x": 4, "start_y": 4, "deltaX": 4, "deltaY": 4, "distance": 4, "dist2goal": 4, "angle2goal": 4})
    - df['team'] = df.team_id
    - df['Period'] = df.period
    - df['Minute'] = df.seconds // 60
    - df['Second'] = df.seconds % 60
    + df['team'] = df.team_id
    + df['Period'] = df.period
    + df['Minute'] = df.seconds // 60
    + df['Second'] = df.seconds % 60
    + #round numerical columns to 4 decimal places (period, minute, second, X, Y)
    + df = df.round({"Period": 4, "Minute": 4, "Second": 4, "seconds": 4, "start_x": 4, "start_y": 4, "deltaX": 4, "deltaY": 4, "distance": 4, "dist2goal": 4, "angle2goal": 4})
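A minimal sketch of the ordering issue, assuming a toy frame with only `period` and `seconds` columns (the values are made up for illustration, and the `decimals` mapping is shortened). pandas documents that keys passed to `DataFrame.round` which are not columns of the frame are ignored, so the early call leaves the derived columns unrounded rather than failing:

```python
import pandas as pd

# Toy data; the real UIED_bepro frame has many more columns.
df = pd.DataFrame({"period": [1, 2], "seconds": [61.123456, 125.987654]})
decimals = {"Period": 4, "Minute": 4, "Second": 4, "seconds": 4}

# Rounding first: "Period", "Minute" and "Second" do not exist yet,
# so only "seconds" is actually rounded here.
rounded_early = df.round(decimals)

# Creating the derived columns first lets the same call round all of them.
df["Period"] = df["period"]
df["Minute"] = df["seconds"] // 60
df["Second"] = df["seconds"] % 60
rounded_late = df.round(decimals)
```

Inspecting `rounded_early.columns` against `rounded_late.columns` makes the difference visible without relying on an exception being raised.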
      event_type_list = []
      for i in range(len(event_df)):
    -     event_i = event_df.iloc[i].event_types
    +     event_i = event_df.iloc[i].filtered_event_types
load_soccertrack.get_additional_features now reads event_df.iloc[i].filtered_event_types, but this column is not created anywhere in load_soccertrack (the event CSV is read as-is). If the input files only contain the previously used event_types column, this will raise an AttributeError. Consider falling back to event_types when filtered_event_types is missing, or ensure filtered_event_types is generated earlier in the loader.
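One way the suggested fallback could look, as a sketch only: `event_type_column` is a hypothetical helper that is not part of the PR, and the sample frame stands in for an older event CSV that only carries `event_types`.

```python
import pandas as pd

def event_type_column(event_df: pd.DataFrame) -> str:
    """Return the event-type column to read, preferring the filtered one."""
    if "filtered_event_types" in event_df.columns:
        return "filtered_event_types"
    return "event_types"

# Hypothetical legacy input that only has the old column.
legacy_df = pd.DataFrame({"event_types": ["Pass", "Shot"]})
column = event_type_column(legacy_df)
event_type_list = legacy_df[column].tolist()
```

Resolving the column name once, before the per-row loop, also avoids repeating the check for every event.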
    # Create 'action' column by concatenating 'event_type' and 'event_type_2'
    df["success"]=df["event_type"].apply(
        lambda x: 0 if ("Failed" in str(x) or "Missed" in str(x) or "OnTarget" in str(x) or "shot" in str(x)) else 1
    )
    df["action"] = df["event_type"] = (
        df["event_type"]
        .astype(str)
The comment says the action column is created by concatenating event_type and event_type_2, but the implementation instead strips suffixes from event_type and assigns the result to both action and event_type. This mismatch is confusing for future maintenance; either update the comment/docstring to describe the actual behavior or implement the concatenation as described (and avoid overwriting the original event_type if it’s still needed).
    if df.action.iloc[i] not in all_cation:
        print(f"Warning: action {df.action.iloc[i]} was not found in the all action list")
all_cation is a list but is used for membership checks inside a per-row loop (if df.action.iloc[i] not in all_cation:). Converting it to a set once (and using that for membership tests) will significantly reduce overhead on large matches.
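A small sketch of the set-based check, assuming a made-up action vocabulary (the real contents of `all_cation` are not shown in this review) and a hypothetical helper `unknown_actions`. Membership tests on a set take constant time on average, whereas a list is scanned element by element:

```python
# Hypothetical action vocabulary; the real all_cation list is defined in UIED_bepro.
all_cation = ["pass", "shot", "dribble", "cross"]
known_actions = set(all_cation)  # built once, outside the per-row loop

def unknown_actions(actions):
    """Return the actions that are missing from the known set, in input order."""
    return [a for a in actions if a not in known_actions]

missing = unknown_actions(["pass", "tackle", "shot", "foul"])
for action in missing:
    print(f"Warning: action {action} was not found in the all action list")
```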
This pull request introduces support for processing soccer event data from the "bepro" data provider, along with several related enhancements and minor fixes. The most significant changes are the addition of a new `UIED_bepro` processing function, integration of "bepro" into the preprocessing pipeline, and updates to test and output file naming.

Bepro Data Provider Integration:
- Added `UIED_bepro` to `soccer_processing.py` for processing and feature engineering of Bepro event data, including possession tracking, action grouping, goal detection, and various time/location-based features.
- Added "bepro" as a provider in `Soccer_event_data.preprocessing_single_df`, calling the new `UIED_bepro` function when appropriate.
- Updated `process_single_match` to use the "bepro" provider for soccer event data.

Testing and Output Improvements:
- Added a `__main__` example to `soccer_processing.py` to process a Bepro CSV file and save the preprocessed output for verification.
- Renamed the output file from `test_load_function_sync.csv` to `load.csv` for clarity and consistency.

Feature and Data Handling Adjustments:
- Changed `get_additional_features` to use the `filtered_event_types` column instead of `event_types`, likely improving event type accuracy.

Versioning:
- Bumped the package version to 0.1.44 in `pyproject.toml`.